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Abstract. An important issue in cosmology is reconstructing the effective dark energy 
equation of state directly from observations. With few physically motivated models, future 
dark energy studies cannot only be based on constraining a dark energy parameter space, 
as the errors found depend strongly on the parametrisation considered. We present a new 
non-parametric approach to reconstructing the history of the expansion rate and dark energy 
using Gaussian Processes, which is a fully Bayesian approach for smoothing data. We present 
a pedagogical introduction to Gaussian Processes, and discuss how it can be used to robustly 
differentiate data in a suitable way. Using this method we show that the Dark Energy 
Survey - Supernova Survey (DES) can accurately recover a slowly evolving equation of state 
to = ±0.05 (95% CL) at z = and ±0.25 at z = 0.7, with a minimum error of ±0.025 at 
the sweet-spot at z ~ 0.16, provided the other parameters of the model are known. Errors 
on the expansion history are an order of magnitude smaller, yet make no assumptions about 
dark energy whatsoever. A code for calculating functions and their first three derivatives 
using Gaussian processes has been developed and is available for download. 
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1 Introduction 

A key problem in cosmology lies in determining whether dark energy is a cosmological con- 
stant and if not, then constraining how it evolves with cosmic time. Previously, this has been 
approached in a model-building way - i.e., constraining specific models of dark energy, such 
as quintessence models or modifications to general relativity [1]. It has been a significant 
problem to produce well motivated models which are not ad hoc in some way. In this sense 
they tend to have functional degrees of freedom - such as the quintessence potential or the 
'/' in f{R) theories of gravity - which have to be constrained via observations. Constraints 
on dark energy are currently derived after free functions in the models are parametrised in 
simple ways. 

Alternatively, one can approach the problem in a different way and look for any de- 
viations from a cosmological constant, irrespective of origin. The dark energy equation of 
state w{z) = p{z)/p{z) is typically constrained using distance measurements as a function of 
redshift. The luminosity distance may be written as 

d^(^) = ^(l + |Lsinfy^ rdz'-?°-Y (1.1) 

where H{z) is given by the Friedmann equation, 

H{zf = Hi |o^(l + zf + Ofc(l + zf + (1 - - Qk) exp 

where Hq = H(z = 0) and ^rn,k are the normalised density parameters. Without a model for 
dark energy it is difficult to use other observations as these require perturbations at some level, 
so distances are the vital observable. A common procedure here is to postulate a multiple 
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parameter form for w{z) and calculate dL{z). The most promising of these approaches uses 
a principal component analysis to construct the 'optimal' basis functions for w{z) based on 
the data available [2-4]. Another uses Gaussian Processes to effectively smooth w and fit it 
to data [5] (we discuss this work below). 

An alternative method is to reconstruct w{z) by directly reconstructing the luminosity- 
distance curve. Writing D{z) = {Hq/c){1 + z)~^dL{z) as the normalised comoving distance, 
we have [7-10]: 

^ 2(1 + + ^kD^)D" - [(1 + zf^kD'^ + 2(1 + z)nkDD' - 3(1 + nkD^)]D' 

3{(1 + Z)2[0, + (1 + z)nrn]D'^ - (1 + nkD^)}D' ^ ■ ' 

Thus, given a distance-redshift curve D(z), we can reconstruct the dark energy equation of 
state, assuming we know the density parameters 0^ and ilk- Different methods for doing 
this involve smoothing the data to give D(z), or parametrising D{z) by a function; see [6] 
for a comprehensive review, and [11-19] for alternative model independent approaches. 

The direct reconstruction method is unstable because of the two derivatives of the 
observed function in Eq. (1.3), requiring the fitting function to accurately capture the slope 
and concavity of luminosity distance curve. This means that differences between the true 
underlying model and the fitted function due to the choice of parametrisation are amplified 
drastically when reconstructing w. Furthermore, w is constructed from a quotient of functions 
which need to balance to obtain the correct w; the denominator can easily pass through zero 
for even small uncertainties making it especially unstable. Typical reconstruction methods 
usually appear to flounder even at moderate redshift for this reason. However, such problems 
may be considered as informing us about the real errors on w without the intrinsic priors 
which arise when we start from a parametric form of w and constrain it from there. 

Indeed, even small errors in the parameters can lead to large errors in w{z). This 
can be seen in the left panel of Fig. 1, where we have assumed that we know D{z), D'(z), 
D"{z) and exactly, and have used the WMAP7 constraints on the matter density, = 
0.275 zb 0.016 [20]. A similar effect happens from uncertainties in the curvature. Even under 
these idealised conditions, the errors of the reconstructed w{z) are large because they properly 
take into account the degeneracies with the density parameters [21-24]. 

Nevertheless, such approaches play an important role in our understanding of dark 
energy for several reasons. A simple one is that w{z) may only be a place-holder phenomeno- 
logical function supposed to encapsulate all possible alternatives to dark energy, such as 
modified gravity [25], and not just simple quintessence models. This means that we have to 
accept there could be really unexpected behaviour. An important example of this happens if 
/9efr passes through zero at some z while pefr remains non-zero, then w{z) has a pole at z. To 
evaluate the integral appearing in H[z) requires integration around the pole, and the residue 
of the integral taken into account (assuming it is defined). No method which starts from a 
set of basis functions for w{z) can recover this behaviour if it is not known a priori. 

Consider the explanation for dark energy suggested by causal set theory [26] wherein 
we should expect the value of A to stochastically fluctuate around the Hubble scale. In that 
case, w would be discontinuous and varying widely over short redshift scales, but this can 
never be represented by choosing simple functions in tw-space. 

Because of these reasons, methods which start from w and work towards D{z) can 
underestimate the errors in our understanding of dark energy. The errors which appear to 
condemn reconstruction methods which start from D{z) and work towards w are actually 
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Figure 1: Left: Dark energy reconstruction using Eq. (1.3) for 0,^ = 0, when D{z) and 
its derivatives are exactly known (ACDM with = 0.275). The uncertainty in the recon- 
structed w (the blue shaded regions show the 68% and 95% CL) only comes from the prior 
on the matter density, 0^ = 0.275 ± 0.016 (WMAP7 [20]). Right: The same plot for fixed 
matter density, = 0.275, and a prior on the curvature of ilk = 0.00 it 0.01. 



much more representative of our genuine errors in this regard. It is for this reason that it is 
important to pursue reconstruction methods, even though they are very challenging. 

In this work, we use Gaussian processes for the reconstruction of D{z) and its derivatives. 
Then equation (1.3) is used to determine w{z). Gaussian processes describe a distribution 
over functions and are thus a generalisation of Gaussian distributions to function space. The 
analysis is fully Bayesian; we start with a prior for the function distribution and combine it 
with the likelihood of observing the data, given that distribution. This leads to a posterior 
function distribution. 

Gaussian processes in combination with MCMC methods have previously been used to 
reconstruct w(z) by Holsclaw et al. [5, 27]. While their method uses integration over w{z) 
to obtain the distance, we reconstruct D{z) and its derivatives in order to determine w{z). 
Gaussian processes typically have an implicit prior favoring smooth functions. This is closely 
related to the preference of simpler models in Bayesian model selection. So in our approach 
we have a smoothness prior on our distance data, whereas, by applying the GP to w{z) 
directly the smoothness prior is rather different in [5, 27]. As we shall see we find different 
results at high redshift. A comparison of the two methods will be given in Section 4. 

The outline of the paper is as follows: We start with an introduction to Gaussian 
processes, including a performance test for the reconstruction of a function and its derivatives. 
In Section 3, Gaussian processes are used to reconstruct H{z), q{z) and w{z) for a mock SN 
data set and for the Union2.1 data set [28]. The results are discussed in Section 4. 

2 Gaussian Processes 

In this section, we summarise the Gaussian process algorithm, which can perform a recon- 
struction of a function from data without assuming a parametrisation of the function. We 
mainly follow the book by Rasmussen and Williams [29]. Other useful references may be 
found in [30, 31] and on the Gaussian Process webpage [32]. We have developed a code 
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for Gaussian processes called GaPP (Gaussian Processes in Python) which is available for 
download^. It can be used to reconstruct a function and its derivatives, from a given data 
set. 

Given a data set P of n observations: 

V = {{x,,yi)\i = l,...,n} (2.1) 

we would like to reconstruct a function f[x) that describes the data. We write the set of 
training inputs, i.e. the locations {xi}^^^ of the observations, as X. The locations, at which 
we want to reconstruct the function, are denoted as X* . 

2.1 Reconstructing a function 

A Gaussian process is the generalisation of a Gaussian distribution. While the latter is 
the distribution of a random variable, the Gaussian process describes a distribution over 
functions. Consider a function / formed from a Gaussian process. The value of / when 
evaluated at a point a; is a Gaussian random variable with mean /i(x) and variance Var(j;). 
The function value at x is not independent of the function value at some other point x 
(especially when x and x are close to each other), but is related by a covariance function 
cov (/(x), /(x)) = k{x,x). Thus, the distribution of functions can be described by the fol- 
lowing quantities: 

fi{x) = E[fix)] , (2.2) 
k{x, x) = E[(/(x) - /x(x))(/(x) - , (2.3) 

Var(a;) = k{x,x) . (2.4) 

The Gaussian process is written as 

f{x)^gV{fiix),k{x,S:)) . (2.5) 

There is a wide range of possible covariance functions. While one will often chose 
covariance functions that only depend on the distance between the input points \x — x\, 
this is not a necessary requirement. Throughout this work, we use the squared exponential 
covariance function: ^ 

k{x, i) = a} exp ^^^^) . (2.6) 

This function has the advantage that it is infinitely differentiable, which is useful for recon- 
structing the derivative of a function. The squared exponential covariance function depends 
on the two 'hyperparameters' aj and i. In contrast to actual parameters, the hyperparame- 
ters do not specify the form of a function. Instead they characterize the "bumpiness" of the 
function. The characteristic length scale £ can be thought of as the distance one has to travel 
in x-direction to get a significant change in /(x), whereas the signal variance aj denotes the 
typical change in the y-direction. 

For a set of input points X = {xi}, the covariance matrix K{X,X) is given by 
[K{X , X)]ij = k(xi,Xj). Even without any observations, one can use the covariance ma- 
trix to generate a random function f{x) from the Gaussian process, i.e. one generates a 
Gaussian vector /* of function values at X* with /* = f{x*): 

f* ^J^{fx*,K{X*,X*)) , (2.7) 

^http: //www . acgc .uct . ac . za/~ seikel/GAPP/ index.html 
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where fi* is the a priori assumed mean of /*. The notation J\f means the Gaussian process 
QV is evaluated at specific points x* , where f{x*) is a random value drawn from a normal 
distribution. As the function is not restricted by any observations, it is quite arbitrary. Its 
values at different locations x* are however correlated by the covariance function. This can 
be considered as a prior on the choice of output functions; only when we add in data at other 
points Xi does the output become constrained further. 

Observational data {xi,yi) can also be described by a Gaussian process, assuming the 
errors are Gaussian. The actual observations are assumed to be scattered around the under- 
lying function, i.e. yi = f{xi) + ei, where Gaussian noise with variance af is assumed. This 
variance has to be added to the covariance matrix: 



y^M{ti,K{X,X) + C) 



(2. 



where C is the covariance matrix of the data. For uncorrelated data we have simply C = 
diag((T?). The two Gaussian processes for /* and y can be combined in the joint distribution: 



y 








f*. 







K{X,X) + C K{X,X*)' 
K{X*,X) K{X*,X*) 



(2.9) 



While y is known from observations, we want to reconstruct /*. This can be done with 
the conditional distribution (see the Appendix) 



where 
and 



r\X*,X,y^M{f*,covir)) , (2.10) 

r = + KiX*,X) [KiX, X) + Cr' {y - /x) (2.11) 
cov(/*) = K{X*,X*) - K{X*,X) [K{X, X) + C]~^ K{X, X*) (2.12) 



are the mean and covariance of /*, respectively. The variance of /* is simply the diagonal of 
cov(/*). Eq. (2.10) is the posterior distribution of the function given the data and the prior 
Eq. (2.7). 

In order to be able to use the above equations for reconstructing a function, we still need 
to know the hyperparameters ctj and £. They can be trained by maximizing the marginal 
likelihood, which is the marginalization over function values / at locations X: 



p{y\X,af,£) = I p{y\f,X)p{f\X,af,£)df . 



(2.13) 



Note that the marginal likelihood only depends on the locations X of the observations, but 
not on the points X* , where we want to reconstruct the function. 

With a Gaussian prior f\X,af,£ ^ Af{fx, K{X , X)) and with y\f ~ Af{f,C), the 
integration of (2.13) yields the log marginal likelihood 



ln£ = lnp{y\X , af 



^{y- [K{X, X) + cr^(y-tj,)-^ ln\K{X, X) + C\ - ^ln2n . (2.14) 



The hyperparameters aj and i can now be optimized by maximizing equation (2.14). 

In a completely Bayesian analysis, one should marginalise over the hyperparameters 
instead of optimising them. This can be done with MCMC methods. However in most 
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(a) prior (b) posterior 



Figure 2: Left: Prior of the Gaussian process as given by Eq. (2.7). Note that the hyper- 
parameters have not been trained yet. Right: Posterior of the Gaussian process as given by 
Eq. (2.10). 



cases, the log marginal likelihood is sharply peaked. Thus, the optimisation is a very good 
approximation of the marginalisation. 

Figure 2 shows an example for the prior and the posterior of a Gaussian process, where 
we have assumed a zero a priori mean function fi{x) = 0. For the prior, the hyperparameters 
have not been trained yet. Thus, the scale of the y-axis is not fixed and all functions are still 
possible. Adding data constrains the function space, which can be seen in the plot for the 
posterior. 

How to GP 

The key steps involved in constructing the GV are: 

• Choose your data set. 

• Choose a covariance function. 

• Choose points x* where we want the function /* to be estimated. 

• Decide a prior mean function /i(x). fi = const, is a safe choice. 

• Train the hyperparameters: 

— Carefully choose initial values for the hyperparameters. It is recommended to try 
different initial values because sometimes the optimisation of the hyperparameters 
can get stuck in a local maximum. 

— Maximise the likelihood function, Eq. (2.14), for the hyperparameters. 

• Calculate /* from Eq. (2.11). This gives the expected value function. 

• Calculate the diagonal elements of cov(/*), Eq. (2.12). This gives the variance of /*. 
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2.2 Reconstructing the derivative of a function 



This method can also be used to reconstruct derivatives of f{x) as the derivative of a Gaussian 
process is again a Gaussian process. While the covariance between the observational points 
stays the same, one also needs a covariance between the function and its derivative and 
one between the derivatives for the reconstruction. These covariances can be obtained by 
differentiating the original covariance function: 



GOV fi, 



dfj \ _ dk{xi,Xj) 



dx 



dxj 



(2.15) 



and 



cov 



(2.16) 



dfj dfj \ _ d'^k{xi,Xj) 
dxi ' dxj J dxidxj 

Covariances for higher derivatives of / are calculated analogously. 

Given the Gaussian process for f{x), the Gaussian processes for the first and second 
derivative are consequently given by: 



fix) ~ gV {ii{x),k{x,x)) 

d'^k{x, x) 

dx dx 
d'^k{x, x] 



fix) ~ gv (^//(x) 
fix) ~ gv (f,"ix] 



dx^ dx^ 



(2.17) 
(2.18) 

(2.19) 



In the following, we only show the procedure for reconstructing the first derivative of /. 
Reconstructions of higher derivatives are done analogously. The joint distribution of y and 
is 

KiX,X) + C K'iX,X*) ' 
K'iX*,X) K"iX*,X*) 



r' 



y 

r' 



where 



and 



[K'iX,X*)\, 



dkixi,x*) 
dx* 



[K"iX*,X*)] 



d^kix*,x*) 



dx* dx* 



(2.20) 
(2.21) 

(2.22) 



K'iX*,X) is the transpose of K'iX, X*). 

Then the conditional distribution is given by: 



f*'\X*,X,y^M(f*',covif*') 



where 



f*' = + K'iX*,X) [KiX, X) + iy - fx) 
cov(/*') = K"iX*,X*) - K'iX*,X) [KiX,X) + C]~^ K'iX,X* 



(2.23) 

(2.24) 
(2.25) 



The hyperparameters are trained in the same way as for the reconstruction oi fix) since 
the marginal likelihood (2.14) that has to be maximized only depends on the observations, 
but not on the function we want to reconstruct. 
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2.3 Combining f{x) and its derivatives 

Often we are not only interested in reconstructing f{x) and its derivatives, but also in cal- 
culating functions g{f{x), f'{x), . . . ), which depend on the function derived by the Gaussian 
process. Then we need to know the covariances between /* = f{x*), /*' = f'{x*) ... at each 
point X* where g is to be reconstructed. These covariances are given by: 



cov(rw,/ 



*(«) f*{j)) 



X , X 



k'^^{x\X) [K{X,X) + C]-'^ K^^\x,x*) 



where J**-*^ is the ith derivative of /*. k^'^'^\x* ,x*) means that k{x*,x*) is derived i times 
with respect to the first argument and j times with respect to the second argument. 

At each point x* , g* = g{f* , /*' . . . ) is then determined by Monte Carlo sampling, where 
in each step /*, /*' . . . are drawn from a multivariate normal distribution: 



7*" 










/*' 





















var(/*) cov(/*,/*') 
cov(/*,/*') var(/*') 



(2.26) 



Instead of Monte Carlo sampling, one might use propagation of errors to determine the 
confidence levels of g{f{x),f'{x),...). This is however a first order approximation and is 
thus only recommended if the errors are small, which is usually not the case when higher 
derivatives are involved. 



2.4 Performance of the reconstruction 

Theoretically, the true function value at a point x* lies between the 1-a limits of the recon- 
structed function (i.e. between f{x*) — y^Var(2;*) and f{x*) + y^Var(x*)) with a probability 
of 68%. Thus, when reconstructing a function over an interval in x direction, one would 
expect the true function to lie between the 1-a limits within 68% of that interval range. Note 
that this is only the expectation value. For one specific reconstruction this percentage might 
be higher or lower. 

In this section, we analyse how this percentage depends on the function that we want to 
reconstruct. It is a reasonable assumption that - given the same amount of data - functions 
that change very rapidly are more difficult to reconstruct than smooth functions. In order 
to test this assumption we consider different superpositions of sin-waves: 

N 

/W(x) = ^|sin(6,x), (2.27) 

i * 

where N is the number of superpositions. The Oj are random numbers between 0.5 and 1, and 
with random sign. The bi are approximately equal to i. (We have not used the exact equality, 
because that would have made it easier for the Gaussian Process to recognise the frequencies.) 
We have chosen functions in which the high frequency terms are suppressed compared to the 
low frequency ones as this provides the most difficult test for a data smoothing techniques 
ability to reconstruct derivatives, and is closest to our problem at hand. While in the second 
derivative of f^^\x) the amplitudes of the different frequencies have the same order of 
magnitude, higher frequencies are suppressed in the function itself and (to a lesser degree) 
in the first derivative. This is shown in Fig. 3 for = 10. The function is much smoother 
than its derivatives. 
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Figure 3: f^^'^^x) from equation (2.27) and its first and second derivative. 

We have then repeatedly created mock data scattered around f^^\x), = 1 — 10, with 
varying noise {a = 0.05,0.1,0.3,0.5) and varying number of data (n = 20,50,70,100,200). 
From each mock data set, we have reconstructed the function and its derivatives using Gaus- 
sian Processes. We then determined the fractions of the range interval, where the true 
functions f^^\x), f^^^'{x) and f^^^"{x) lie between the la and 2a limits of their respective 
reconstructions. The result is shown in Fig. 4. Each point corresponds to the average over 
200 realizations of the mock data set. 

The reconstruction of f^^\x) gives results that are quite close to the expected values. 
However, when high frequencies are involved, the errors of the reconstructed function are 
slightly underestimated, leading to a smaller fraction of the true function lying between the 
reconstructed la and 2a limits. When the Gaussian Process fails to recognise high frequen- 
cies, this only has a small effect on the reconstruction of f^^\x), as the high frequencies are 
suppressed. However, this effect becomes large for the derivatives f^^^'{x) and f^^^"(x). 

We could be faced with this problem when reconstructing w{z). If w{z) had high 
frequency contributions they would be suppressed in the luminosity distance because one has 
to integrate twice to obtain diiz) from w^z) using equations (1.1) and (1.2). 

On the other hand, the errors in f^^^'{x) and f^^^"{x) are overestimated in the absence 
of high frequency terms. Thus, the errors of the reconstruction reflect our lack of knowledge 
about the frequencies that contribute to the differentiated function. Oscillations that are 
present in the derivatives are in general smoothed by integration and therefore hard to 
detect in the function. The Gaussian process accounts for that ignorance by choosing a 
balance between the cases where high frequencies are present or absent. 

This balance is achieved by a weighted average over function space, where more weight 
is on smoother functions. These represent simpler models, which are preferred in Bayesian 
analysis. This can be seen when we perform the average over the vector of function values 
/* (see Eq. (A. 12)). These function values are not independent from each other, but are 
linked by covariances. When we consider two points j;^ and Xg, then similar values of /(x^) 
and f{x2) are preferred, unless observational evidence indicates a difference in their values. 
Consequently, the smoothest functions that are consistent with the observations are preferred 
to functions with higher frequencies. 

3 Reconstruction of H{z), q{z) and w{z) 

There are many functions of the distance data which provide information about dark energy 
dynamics. We use, in addition to the distance and its first and second derivatives, the 
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Figure 4: The function f^^\x) from Eq. (2.27) and its first and second derivatives are 
reconstructed using a GP, for different numbers of superpositions of sine waves N. We show 
the fractions of the range interval where the true function (top left) and its derivatives (first, 
top right; second, bottom left) a lie between the expected 1- and 2-a limits {1-a: red points, 
2-cr: blue points; the red and blue lines are the respective expectation values). Each point 
is the result of averaging over 200 mock data sets. Data sets with a smaller number of data 
are slightly shifted to the left, those with larger numbers to the right. 



Hubble rate, H{z), the deceleration parameter q{z), and w{z). We assume Qj, = here so 
H{z) provides information about the dark energy density without the degeneracy with 
Similarly, q{z) provides information about deviations from ACDM without this degeneracy 
too. 

3.1 Mock data 

We shall now use the GP method to smooth over a mock SNIa catalogue, and reconstruct 
the functions H{z), q{z) and w{z) for a given dark energy model. As we are using the 
dimensionless distance D{z), the analysis of the mock data set does not depend on the 
present Hubble rate Hq. 

The Dark Energy Survey (DES) - Supernova Survey [33] is expected to obtain high 
quality light-curves for about 4000 SNe la up to redshift z = 1.2 in the next five years. We 
use the anticipated redshift histogram given in [33], to create mock data sets for different 
dark energy models: ACDM and a model with slowly evolving w{z). The aim of this section 
is to determine if the Gaussian process can recover the correct behaviours of the respective 
models. 
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In the following, we will assume a flat universe, $7^ = 0. Then the Hubble rate is given 

by 

= ^ (3^1) 

and the deceleration parameter by 

D"(z'\ 

q{z) = -^{l + z)-l. (3.2) 

For the reconstruction of w{z), we set Vim = 0.3, which is the same matter density used to 
create the mock data sets. 

3.1.1 ACDM 

We start with a ACDM model with Vtm = 0.3. We created a mock data set according to 
anticipated results from DES and used Gaussian processes (with fi[z) = 0) to reconstruct 
the distance D[z) and its derivatives. The result is shown in Fig. 5. The blue line shows the 
mean of the reconstruction and the shaded areas its 68% and 95% CL. D{z) is reconstructed 
with very high precision within the redshift range of the data. At higher redshifts the errors 
on the reconstruction increase, which is an expected behaviour. For the first derivative, the 
point where the errors start to increase significantly is shifted to lower redshifts. 

The distance-redshift relation D{z) of the ACDM model (red line) and its derivatives 
are reconstructed nicely by the Gaussian process. 

As the Hubble rate H{z)/Hq is simply the inverse of D'{z) (when assuming Vik = 0), 
its reconstruction is also very precise. This is shown in Fig. 6. 

The formula for the reconstruction of q{z) is slightly more complex and contains the 
first and second derivatives of the distance [see Eq. (3.2)]. This leads to larger errors at high 
redshifts. At low redshifts however, we get tight error bars - see Fig. 6. 

The reconstruction of the equation of state w{z) requires a more complicated formula 
(1.3). Especially when the true value of the denominator is close to zero, small errors in the 
distance can lead to large errors in w{z). In fact it can be seen in Fig. 6 that the reconstruction 
errors explode at redshifts z > 0.7. While the true model lies within the reconstructed 
95% CL, a large variety of evolving dark energy models would also be consistent with this 
reconstruction. 

3.1.2 Evolving w 

Next, we test a model with a slowly evolving equation of state: 



w{z) = - ( — 1 + tanh 



(3.3) 



The results are shown in Fig. 7. The Gaussian process capture the model (black line) correctly 
for H{z), q{z) and 'w{z). Also shown is a ACDM model, which is not consistent with the 
reconstruction. Thus, the Gaussian process can correctly distinguish between these two 
models, assuming that we know the matter density accurately enough. 
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Figure 5: Reconstruction of D(z), D'{z) and D"{z) obtained from a mock data set following 
DES specifications and assuming a ACDM model with $7^ = 0.3 (red line). The shaded blue 
regions are the 68% and 95% CL of the reconstruction. 



3.2 Union2.1 SNIa data 

In this section, we apply the Gaussian process to real SN la data We use the Union2.1 data 
set [28], which contains 580 SNe la. We transformed the distance modulus m — M given in 
the data set to D using 

m - M + 5 log - 25 = 5 log((l + z)D) (3.4) 

with Hq = 70km/ (s Mpc). Note that the values of D do not depend on Hq itself, but on 
a combination of Hq and the absolute magnitude M, which can be written as = M — 
5 log {Hq/c)+25. To account for various systematics in the data set - such as the calibration of 
the absolute magnitude - we use the full covariance matrix for the data. The Gaussian process 
analysis implicitly assumes that the errors in D follow a Gaussian distribution. However, 
(relatively) small deviations from Gaussianity do not affect the result of the Gaussian process 
significantly. 

The reconstruction of the distance is shown in Fig. 8. As expected the errors of the 
reconstruction are larger than determined in 3.1 for the upcoming DES survey due the smaller 
number of SNe la and larger measurement errors. The reconstructions for the distance, as 
well as for H{z), q{z) and w^z) (cf. Fig. 9) are consistent with ACDM. For the reconstruction 
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Figure 6: Reconstruction of H{z) (top left), q{z) (top rigtit) and w{z) (bottom left) obtained 
from a mock data set following DES specifications and assuming a ACDM model with = 
0.3 (red line). The shaded blue regions are the 68% and 95% CL of the reconstruction. The 
errors on w{z) correctly blow up past z ~ 1 where there is limited data, and the effect of w 
on distances is suppressed. The bottom right plot shows the widths of the 68% and 95% CL 
regions. 



of w{z) we have assumed Qm = 0.270 it 0.015, which is the constraint on the matter density 
for a flat universe with time dependent equation of state given in [28]. 

4 Discussion 

We have presented a new approach to reconstructing the dark energy equation of state using 
Gaussian processes. We use the GP to smooth the data, and to produce estimates of the first 
and second derivatives of the distance data. This is then combined to give w{z), and any 
other function of the background cosmology we might be interested in - we have considered 
H{z) and q{z). This approach performs extremely well at low and moderate redshift - i.e., 
when dark energy affects the global cosmological dynamics (recall we do not assume a CMB 
distance prior). We have shown that DES can recover H{z) to sub-percent accuracy, and 
w{z) to a few percent. Larger errors at high redshift simply reflect a lack of data there. 

We have applied our analysis also to the Union2.1 supernova set [28]. The results are 
consistent with ACDM. But note that the distance moduli in this data set were derived 
in a model dependent way; the cosmological model is fitted at the same time as some of 
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Figure 7: Reconstruction of H{z), q(z) and w{z) obtained from a mock data set following 
DES specifications and assuming a model with w{z) = 0.5(— l+tanli[3(2;— 0.5)]) and 0^ = 0.3 
(red line). The shaded blue regions are the 68% and 95% CL of the reconstruction. The 
bottom right plot shows the widths of the 68% and 95% CL regions. 



the nuisance parameters. For a fully self-consistent analysis, we would need to include the 
derivation of the distance moduli directly into our Gaussian process analysis. This is beyond 
the scope of the present work. While it would certainly be interesting to perform a more 
realistic analysis in future work, the present paper already shows that Gaussian processes 
are a powerful analysis tool. 

Yet, one has to be careful when interpreting the results. As we have shown in section 2.4, 
the goodness of the reconstruction depends on the smoothness of the true function and 
the quality of the data. High frequency terms in ■w{z) are very hard to detect by distance 
measurements. If such terms were present, we would not be able to see them with the present 
Union2.1 SNe, nor with the future DES SNe. A slowly evolving equation of state is however 
often captured within the 68% CL, i.e the errors are overestimated. The reconstructed errors 
represent our lack of knowledge about the smoothness of the function. The Gaussian process 
automatically determines the errors such that a balance between very smooth functions and 
rapidly oscillating functions is obtained. 

While we started with the reconstruction of D{z) using Gaussian Processes and sub- 
sequently determined wi^z), Holsclaw et al. [5] followed a different approach. They use a 
GP-based MCMC algorithm. Starting with initial values for the hyperparameters and for 
the vector w (containing values of w{z) at different redshifts), they perform the first inte- 
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Figure 8: Reconstruction of D{z), D'{z) and D"{z) obtained from the Union2.1 SN data 
set. The shaded blue regions are the 68% and 95% CL of the reconstruction. Also shown is 
a ACDM model with 0.^ = 0.27. 



gral over the Gaussian Process of w analytically and the second one numerically, and finally 
calculate the distance modulus. The hyperparameters and w are then varied in each step of 
the MCMC and the resulting distance modulus is compared to observational data from the 
Constitution set [34]. 

Using this method, the errors on w{z) are much more uniform than in our approach. 
Our errors are smaller at low redshifts and larger at high redshifts. In [5], the errors do not 
depend strongly on redshift - even at high redshifts, where the data density is significantly 
smaller, and the effect of dark energy on the cosmological dynamics very weak. In fact, their 
errors on w are smaller than those induced from uncertainties in ilrn (note that they use 
broader priors than we have assumed in the left panel of Fig. 1). Their method is optimised 
towards smooth 'w{z); this preference is much stronger than in our approach, which prefers 
smooth distances instead. This is similar for other approaches which focus on w{z) (see 
e.g., [4]). Note, however, that we use different data assumptions, and do not assume a CMB 
distance prior. 

Of course, methods which start form w provide valuable constraints on dark energy, 
and we do not advocate reconstruction as a replacement of such approaches. Instead they 
should be considered as complementary as it helps us understand how differing priors used 
in the construction of w{z) affects our final result. In addition, by not assuming that the 
dark energy equation of state has physical significance, a reconstruction approach allows 
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Figure 9: Reconstruction of H(z), q{z) and w{z) obtained from tlie Union2.1 SN data set. 
Tlie shaded blue regions are the 68% and 95% CL of the reconstruction. Also shown is a 
ACDM model with = 0.27. The reconstruction of w{z) is obtained assuming = 
0.27 ± 0.015 (constraints taken from [28]). 



us to consider more general models where the effective w{z) is ill defined. Constraints on 
the expansion dynamics are readily obtained without invoking dark energy models at all. 
Furthermore it is readily used for non-standard models, and consistency tests of the FLRW 
models themselves, as we shall consider in future work. 
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A Conditional distribution of the multivariate normal distribution 



Starting from the joint distribution for y and /*, we want to calculate the conditional dis- 
tribution f*\y. The joint distribution is given by 



y 

r 



K K* 

K*^ K** 



(A.l) 



where we have used the abbreviations K = K{X,X) + C, K* = K{X,X*) and K** = 
K{X* , X*). Denoting the combined covariance matrix as K, the joint probability distribu- 
tion is: 



p{y,r 



(27r)^/Vdet(if) 



1 



K- 



y- fj- 



(A.2) 



with iV = n -|- n*, where n and n* are the dimensions of y and /*, respectively. 
The determinant for block matrices is given by 



det{K) = det{K) det{K** - K*'^K**-^K*) 



and the inverse by 



where 



K- 



Mii Mi2 
M21 M22 



M. 



-k-^K*{K** - K*'^k~^K*)-^ 
Using the matrix inversion lemma, we can write Mn as 

Mil = k-^ + k-^K*{K** - K*^k-^K*)-^K*'^k-^ 



Mil 
M12 



21 



(A.3) 
(A.4) 

(A.5) 
(A.6) 
(A.7) 

(A.8) 



Inserting these results into equation (A.2), we get for the joint probability distribution 



p{y,r) 



: exp 



(2^)"/2-y/det(K) 
1 



hy-^,fk"Hy-^^) 



+ 



(27r)"*/2y/d^^(A) 



exp --(f*-bfA-\r-b) 



with 



b = fi* + K*^k-\y - ^l) 
A = K** - K*'^k-^K* 



The marginal probability distribution of y is 

p{y) = j p{yJ*)dr 



(A.9) 



(A.IO) 
(A.ll) 



(A.12) 



: exp 



(27r)"/2./det(i^) 



.-[y-^,fk-\y-^l) 
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and the conditional probability distribution 

p{y,f*) 



p{r\y) 



p{y) 



1 

; exp 



l(f*-bfA-Hr-b) 



(2^)™*/2ydit(:4) 

= Mib,A) (A.13) 
Note that b and A are equal to /* and cov(/*) of equation (2.10). 
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